Bioinformatics (Thomas Dandekar, Meik Kunz)

For example, if I originally have a GC base pair in my molecule, I could also use an
AU, UA, or CG base pair in its place, but all other nucleotide exchanges no longer result in
strong base pairings (a “weak” GU or UG pair can help in the transition from one stable pair to
another). These compensatory base exchanges within a molecule happen somewhat more
easily, but all other changes also accumulate over time, so the U4 RNA changes in structure
depending on the organism, and interacting partner molecules also change to a greater or lesser
extent. In our example, this is in particular the catalytically active, mRNA-splicing U6 RNA,
which is initially kept inactive by the U4 RNA because the U4 RNA fits like a cap on
the U6 RNA (this structure was called the Y model because of its shape).
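The pairing rule described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `classify_pair` and `is_compensatory` are hypothetical, and the sketch assumes that only the four Watson-Crick pairs count as strong and only GU/UG as weak.

```python
# Assumption: only Watson-Crick pairs are "strong", only G-U wobble pairs are "weak".
STRONG_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WEAK_PAIRS = {("G", "U"), ("U", "G")}


def classify_pair(left: str, right: str) -> str:
    """Return 'strong', 'weak' or 'none' for two RNA nucleotides facing each other."""
    pair = (left.upper(), right.upper())
    if pair in STRONG_PAIRS:
        return "strong"
    if pair in WEAK_PAIRS:
        return "weak"
    return "none"


def is_compensatory(old_pair, new_pair) -> bool:
    """True if the pair changed but is still a strong base pair afterwards."""
    return (
        old_pair != new_pair
        and classify_pair(*old_pair) == "strong"
        and classify_pair(*new_pair) == "strong"
    )


# One possible route from GC to AU: the weak GU pair bridges the two stable pairs.
route = [("G", "C"), ("G", "U"), ("A", "U")]
print([classify_pair(*p) for p in route])        # ['strong', 'weak', 'strong']
print(is_compensatory(("G", "C"), ("A", "U")))   # True
print(is_compensatory(("G", "C"), ("G", "A")))   # False
```

The route shows why the weak pair matters: a single mutation (C to U) never leaves the helix unpaired, and a second mutation (G to A) restores a strong pair.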

By analyzing many U4 RNA structures in this example, we can see how evolution
works. First (over short periods of time, in closely related organisms), single mutations
change the sequence; then, over longer periods of time (in more distantly related organisms),
the structure also changes, perhaps new partner molecules are found, or the gene simply
duplicates so that the second copy can mutate more easily and take on a completely new
function. Evolution by mutation and selection of mutations with an adaptive advantage can
thus be traced in detail by RNA structure analysis.
Comparing the RNA structure across many organisms helps in this process.

10.4 Describing Evolution: Phylogenetic Trees

To do this, one only has to calculate phylogenetic trees for a widespread gene, i.e. see
which organisms are closely or distantly related according to their sequence
and also try to work out the earlier branchings and precursor molecules. Although these
precursors are very rarely preserved (only if, for example, the extinct mammoth
can be thawed from the ice and sequenced), the information about the precursor
molecules is hidden in the existing sequences. Bioinformatics allows us to work
out the precursors. There are several ways to do this. The easiest to calculate is the
neighbour-joining method. Here, one first sorts the molecules that one wants to connect in the
phylogenetic tree according to their similarity and then repeatedly calculates the respective
ancestors for direct neighbours.
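A minimal sketch of neighbour joining in Python may make the procedure more concrete. Several things here are assumptions for illustration only: the five short RNA sequences are invented toy data, the distance is a plain count of mismatches in an existing alignment (real analyses usually apply a correction model), branch lengths are not reported, and the function names are hypothetical. The pair to be joined is chosen with the usual neighbour-joining criterion, which prefers pairs that are close to each other but far from everything else.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def neighbour_joining(seqs: dict) -> tuple:
    """Return an unrooted tree as nested tuples (topology only, no branch lengths)."""
    nodes = list(seqs)
    d = {frozenset(p): hamming(seqs[p[0]], seqs[p[1]]) for p in combinations(nodes, 2)}

    def dist(a, b):
        return 0 if a == b else d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to all others.
        total = {a: sum(dist(a, b) for b in nodes) for a in nodes}
        # Join the pair that minimises the neighbour-joining criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * dist(*p) - total[p[0]] - total[p[1]],
        )
        ancestor = (i, j)  # the inferred common ancestor of the two neighbours
        for k in nodes:
            if k != i and k != j:
                d[frozenset((ancestor, k))] = 0.5 * (dist(i, k) + dist(j, k) - dist(i, j))
        nodes = [k for k in nodes if k != i and k != j] + [ancestor]
    return tuple(nodes)


# Invented toy alignment: A and B share one change, C and D share another.
toy = {
    "A": "UGUGGGG",
    "B": "GUUGGGG",
    "C": "GGGUGUG",
    "D": "GGGGUUG",
    "E": "GGGGGGU",
}
print(neighbour_joining(toy))  # ('E', ('A', 'B'), ('C', 'D'))
```

In the toy data, A and B end up as direct neighbours, as do C and D, while E attaches to the centre of the unrooted tree.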

A somewhat more elaborate procedure is “parsimony”: it starts similarly, but calculates
the mostly not directly observable ancestors of today’s molecules in such a way that
all of today’s observed sequences can be generated with as few mutations of these precursor
sequences as possible. This reflects the actual conditions surprisingly well, because each
individual mutation is very rare. A phylogenetic tree that introduces an unnecessarily large
number of mutations is therefore a priori less likely than a phylogenetic tree that manages
with as few mutations as possible.
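The idea of counting mutations on a tree can be illustrated with a short Python sketch. It uses Fitch's counting scheme for a given tree, which is one common way to do this and is not necessarily the procedure the authors have in mind. The four sequences are invented toy data, and the function names `fitch` and `parsimony_score` are hypothetical.

```python
def fitch(tree, column):
    """Return (possible ancestral states, minimum number of mutations) for one site.

    tree   : a leaf name (str) or a pair (left subtree, right subtree)
    column : dict mapping leaf name -> nucleotide at this alignment position
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    shared = left_states & right_states
    if shared:
        # Both subtrees agree on at least one state: no extra mutation needed.
        return shared, left_cost + right_cost
    # No common state: one mutation must have happened on one of the branches.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum the minimum mutation counts over all alignment positions."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )


# Invented toy alignment and two competing trees.
toy = {"A": "UGUGGG", "B": "GUUGGG", "C": "GGGUGU", "D": "GGGGUU"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, toy))  # 6
print(parsimony_score(tree_2, toy))  # 8
```

The first tree explains the toy sequences with six mutations, the second needs eight, so parsimony prefers the first. A full parsimony analysis would additionally search over many candidate trees, which this sketch does not attempt.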

It stands to reason that the most accurate phylogenetic tree is one that does not simply
take the ancestors as a whole, but calculates probabilities that are as exact as possible
for each individual mutation. This can be done by means of the so-called maximum-likelihood method,
i.e. the calculation of the most probable path for all mutations. For this, one has to estimate

10 Understand Evolution Better Applying the Computer